GNU Info File | 1993-10-03 | 50KB | 927 lines
This is Info file gawk.info, produced by Makeinfo-1.47 from the input
file gawk.texi.
This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.
This is Edition 0.14 of `The GAWK Manual',
for the 2.14 version of the GNU implementation
of AWK.
Copyright (C) 1989, 1991, 1992 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
File: gawk.info, Node: Time Functions, Prev: I/O Functions, Up: Built-in
Functions for Dealing with Time Stamps
======================================
A common use for `awk' programs is the processing of log files. Log
files often contain time stamp information, indicating when a
particular log record was written. Many programs log their time stamp
in the form returned by the `time' system call, which is the number of
seconds since a particular epoch. On POSIX systems, it is the number
of seconds since Midnight, January 1, 1970, UTC.
In order to make it easier to process such log files, and to easily
produce useful reports, `gawk' provides two functions for working with
time stamps. Both of these are `gawk' extensions; they are not
specified in the POSIX standard, nor are they in any other known version
of `awk'.
`systime()'
This function returns the current time as the number of seconds
since the system epoch. On POSIX systems, this is the number of
seconds since Midnight, January 1, 1970, UTC. It may be a
different number on other systems.
`strftime(FORMAT, TIMESTAMP)'
This function returns a string. It is similar to the function of
the same name in the ANSI C standard library. The time specified
by TIMESTAMP is used to produce a string, based on the contents of
the FORMAT string.
The `systime' function allows you to compare a time stamp from a log
file with the current time of day. In particular, it is easy to
determine how long ago a particular record was logged. It also allows
you to produce log records using the "seconds since the epoch" format.
The `strftime' function allows you to easily turn a time stamp into
human-readable information. It is similar in nature to the `sprintf'
function, copying non-format specification characters verbatim to the
returned string, and substituting date and time values for format
specifications in the FORMAT string. If no TIMESTAMP argument is
supplied, `gawk' will use the current time of day as the time stamp.
`strftime' is guaranteed by the ANSI C standard to support the
following date format specifications:
`%a'
The locale's abbreviated weekday name.
`%A'
The locale's full weekday name.
`%b'
The locale's abbreviated month name.
`%B'
The locale's full month name.
`%c'
The locale's "appropriate" date and time representation.
`%d'
The day of the month as a decimal number (01--31).
`%H'
The hour (24-hour clock) as a decimal number (00--23).
`%I'
The hour (12-hour clock) as a decimal number (01--12).
`%j'
The day of the year as a decimal number (001--366).
`%m'
The month as a decimal number (01--12).
`%M'
The minute as a decimal number (00--59).
`%p'
The locale's equivalent of the AM/PM designations associated with
a 12-hour clock.
`%S'
The second as a decimal number (00--61). (Occasionally there are
minutes in a year with one or two leap seconds, which is why the
seconds can go from 0 all the way to 61.)
`%U'
The week number of the year (the first Sunday as the first day of
week 1) as a decimal number (00--53).
`%w'
The weekday as a decimal number (0--6). Sunday is day 0.
`%W'
The week number of the year (the first Monday as the first day of
week 1) as a decimal number (00--53).
`%x'
The locale's "appropriate" date representation.
`%X'
The locale's "appropriate" time representation.
`%y'
The year without century as a decimal number (00--99).
`%Y'
The year with century as a decimal number.
`%Z'
The time zone name or abbreviation, or no characters if no time
zone is determinable.
`%%'
A literal `%'.
If a conversion specifier is not one of the above, the behavior is
undefined. (1)
Informally, a "locale" is the geographic place in which a program is
meant to run. For example, a common way to abbreviate the date
September 4, 1991 in the United States would be "9/4/91". In many
countries in Europe, however, it would be abbreviated "4.9.91". Thus,
the `%x' specification in a `"US"' locale might produce `9/4/91', while
in a `"EUROPE"' locale, it might produce `4.9.91'. The ANSI C standard
defines a default `"C"' locale, which is an environment that is typical
of what most C programmers are used to.
A public-domain C version of `strftime' is shipped with `gawk' for
systems that are not yet fully ANSI-compliant. If that version is used
to compile `gawk' (*note Installing `gawk': Installation.), then the
following additional format specifications are available:
`%D'
Equivalent to specifying `%m/%d/%y'.
`%e'
The day of the month, padded with a blank if it is only one digit.
`%h'
Equivalent to `%b', above.
`%n'
A newline character (ASCII LF).
`%r'
Equivalent to specifying `%I:%M:%S %p'.
`%R'
Equivalent to specifying `%H:%M'.
`%T'
Equivalent to specifying `%H:%M:%S'.
`%t'
A TAB character.
`%C'
The century, as a number between 00 and 99.
`%u'
The weekday as a decimal number [1 (Monday)--7].
`%V'
The week number of the year (the first Monday as
the first day of week 1) as a decimal number (01--53). The method
for determining the week number is as specified by ISO 8601 (to
wit: if the week containing January 1 has four or more days in the
new year, then it is week 1, otherwise it is week 53 of the
previous year and the next week is week 1).
`%Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI'
`%Om %OM %OS %Ou %OU %OV %Ow %OW %Oy'
These are "alternate representations" for the specifications that
use only the second letter (`%c', `%C', and so on). They are
recognized, but their normal representations are used. (These
facilitate compliance with the POSIX `date' utility.)
`%v'
The date in VMS format (e.g. 20-JUN-1991).
Here are two examples that use `strftime'. The first is an `awk'
version of the C `ctime' function. (This is a user-defined function,
which we have not discussed yet. *Note User-defined Functions:
User-defined, for more information.)
# ctime.awk
#
# awk version of C ctime(3) function
function ctime(ts,    format)
{
    format = "%a %b %e %H:%M:%S %Z %Y"
    if (ts == 0)
        ts = systime()       # use current time as default
    return strftime(format, ts)
}
This next example is an `awk' implementation of the POSIX `date'
utility. Normally, the `date' utility prints the current date and time
of day in a well-known format. However, if you provide an argument to
it that begins with a `+', `date' will copy non-format specifier
characters to the standard output, and will interpret the current time
according to the format specifiers in the string. For example:
date '+Today is %A, %B %d, %Y.'
might print
Today is Thursday, July 11, 1991.
Here is the `awk' version of the `date' utility.
#! /usr/bin/gawk -f
#
# date --- implement the P1003.2 Draft 11 'date' command
#
# Bug: does not recognize the -u argument.
BEGIN \
{
    format = "%a %b %e %H:%M:%S %Z %Y"
    exitval = 0
    if (ARGC > 2)
        exitval = 1
    else if (ARGC == 2) {
        format = ARGV[1]
        if (format ~ /^\+/)
            format = substr(format, 2) # remove leading +
    }
    print strftime(format)
    exit exitval
}
---------- Footnotes ----------
(1) This is because the ANSI standard for C leaves the behavior of
the C version of `strftime' undefined, and `gawk' will use the system's
version of `strftime' if it's there. Typically, the conversion
specifier will either not appear in the returned string, or it will
appear literally.
File: gawk.info, Node: User-defined, Next: Built-in Variables, Prev: Built-in, Up: Top
User-defined Functions
**********************
Complicated `awk' programs can often be simplified by defining your
own functions. User-defined functions can be called just like built-in
ones (*note Function Calls::.), but it is up to you to define them--to
tell `awk' what they should do.
* Menu:
* Definition Syntax:: How to write definitions and what they mean.
* Function Example:: An example function definition and
what it does.
* Function Caveats:: Things to watch out for.
* Return Statement:: Specifying the value a function returns.
File: gawk.info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined
Syntax of Function Definitions
==============================
Definitions of functions can appear anywhere between the rules of the
`awk' program. Thus, the general form of an `awk' program is extended
to include sequences of rules *and* user-defined function definitions.
The definition of a function named NAME looks like this:
function NAME (PARAMETER-LIST) {
    BODY-OF-FUNCTION
}
NAME is the name of the function to be defined. A valid function name
is like a valid variable name: a sequence of letters, digits and
underscores, not starting with a digit. Functions share the same pool
of names as variables and arrays.
PARAMETER-LIST is a list of the function's arguments and local
variable names, separated by commas. When the function is called, the
argument names are used to hold the argument values given in the call.
The local variables are initialized to the null string.
The BODY-OF-FUNCTION consists of `awk' statements. It is the most
important part of the definition, because it says what the function
should actually *do*. The argument names exist to give the body a way
to talk about the arguments; local variables, to give the body places
to keep temporary values.
Argument names are not distinguished syntactically from local
variable names; instead, the number of arguments supplied when the
function is called determines how many argument variables there are.
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments, and the rest are local variables.
It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others. Another way
to think of this is that omitted arguments default to the null string.
Usually when you write a function you know how many names you intend
to use for arguments and how many you intend to use as locals. By
convention, you should write an extra space between the arguments and
the locals, so other people can follow how your function is supposed to
be used.
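As a minimal sketch of these rules, the following uses a hypothetical
function `greet' (the names `greet', `name' and `msg' are ours, chosen
only for illustration). Its parameter list names one intended argument
and, after the extra space, one local variable. It is called once with
an argument and once without, to show that an omitted argument arrives
as the null string.

```shell
out=$(awk '
# greet is a hypothetical function: name is the argument, msg is a local.
function greet(name,    msg) {
    if (name == "")          # an omitted argument arrives as the null string
        name = "world"
    msg = "hello, " name
    return msg
}
BEGIN {
    print greet("awk")       # one argument supplied
    print greet()            # no arguments: name and msg both start out null
}')
printf '%s\n' "$out"
```

The second call falls back to `"world"' because `name' holds the null
string when the call supplies no value for it.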
During execution of the function body, the arguments and local
variable values hide or "shadow" any variables of the same names used
in the rest of the program. The shadowed variables are not accessible
in the function definition, because there is no way to name them while
their names have been taken away for the local variables. All other
variables used in the `awk' program can be referenced or set normally
in the function definition.
The arguments and local variables last only as long as the function
body is executing. Once the body finishes, the shadowed variables come
back.
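The shadowing described above can be sketched as follows. The function
`bump' and the variables `x', `r' and `total' are hypothetical names
used only for this illustration. The parameter `x' hides the global
`x' while the body runs, whereas `total' is not in the parameter list
and so refers to the global variable.

```shell
out=$(awk '
function bump(x) {           # this x shadows the global x while bump runs
    x = x + 1                # changes only the local copy
    total = total + x        # total is not a parameter, so this is the global
    return x
}
BEGIN {
    x = 10
    total = 0
    r = bump(5)
    print x, r, total        # the global x is unchanged after the call
}')
printf '%s\n' "$out"
```

Under these assumptions the program prints `10 6 6': the global `x' is
still 10, the call returned 6, and the global `total' was updated from
inside the function.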
The function body can contain expressions which call functions. They
can even call this function, either directly or by way of another
function. When this happens, we say the function is "recursive".
There is no need in `awk' to put the definition of a function before
all uses of the function. This is because `awk' reads the entire
program before starting to execute any of it.
In many `awk' implementations, the keyword `function' may be
abbreviated `func'. However, POSIX only specifies the use of the
keyword `function'. This actually has some practical implications. If
`gawk' is in POSIX-compatibility mode (*note Invoking `awk': Command
Line.), then the following statement will *not* define a function:
func foo() { a = sqrt($1) ; print a }
Instead it defines a rule that, for each record, concatenates the value
of the variable `func' with the return value of the function `foo', and
based on the truth value of the result, executes the corresponding
action. This is probably not what was desired. (`awk' accepts this
input as syntactically valid, since functions may be used before they
are defined in `awk' programs.)
File: gawk.info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
Function Definition Example
===========================
Here is an example of a user-defined function, called `myprint', that
takes a number and prints it in a specific format.
function myprint(num)
{
    printf "%6.3g\n", num
}
To illustrate, here is an `awk' rule which uses our `myprint' function:
$3 > 0 { myprint($3) }
This program prints, in our special format, all the third fields that
contain a positive number in our input. Therefore, when given:
1.2 3.4 5.6 7.8
9.10 11.12 -13.14 15.16
17.18 19.20 21.22 23.24
this program, using our function to format the results, prints:
   5.6
  21.2
Here is a rather contrived example of a recursive function. It
prints a string backwards:
function rev (str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)
    rev(str, len - 1)
}
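The definition above does not show a call. One possible way to drive
it, offered only as a sketch, is to pass the string together with its
length, so that the recursion starts at the last character. The sample
string `"hello"' and the variable `s' are our own choices.

```shell
out=$(awk '
function rev(str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)
    rev(str, len - 1)
}
BEGIN {
    s = "hello"              # sample string, chosen for illustration
    rev(s, length(s))        # start at the last character
}')
printf '%s\n' "$out"
```

Each call prints one character and then calls `rev' again with a
length that is one smaller, until the length reaches zero.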
File: gawk.info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
Calling User-defined Functions
==============================
"Calling a function" means causing the function to run and do its
job. A function call is an expression, and its value is the value
returned by the function.
A function call consists of the function name followed by the
arguments in parentheses. What you write in the call for the arguments
are `awk' expressions; each time the call is executed, these
expressions are evaluated, and the values are the actual arguments. For
example, here is a call to `foo' with three arguments (the first being
a string concatenation):
foo(x y, "lose", 4 * z)
*Caution:* whitespace characters (spaces and tabs) are not allowed
between the function name and the open-parenthesis of the argument
list. If you write whitespace by mistake, `awk' might think that
you mean to concatenate a variable with an expression in
parentheses. However, it notices that you used a function name
and not a variable name, and reports an error.
When a function is called, it is given a *copy* of the values of its
arguments. This is called "call by value". The caller may use a
variable as the expression for the argument, but the called function
does not know this: it only knows what value the argument had. For
example, if you write this code:
foo = "bar"
z = myfunc(foo)
then you should not think of the argument to `myfunc' as being "the
variable `foo'." Instead, think of the argument as the string value,
`"bar"'.
If the function `myfunc' alters the values of its local variables,
this has no effect on any other variables. In particular, if `myfunc'
does this:
function myfunc (win) {
    print win
    win = "zzz"
    print win
}
to change its first argument variable `win', this *does not* change the
value of `foo' in the caller. The role of `foo' in calling `myfunc'
ended when its value, `"bar"', was computed. If `win' also exists
outside of `myfunc', the function body cannot alter this outer value,
because it is shadowed during the execution of `myfunc' and cannot be
seen or changed from there.
However, when arrays are the parameters to functions, they are *not*
copied. Instead, the array itself is made available for direct
manipulation by the function. This is usually called "call by
reference". Changes made to an array parameter inside the body of a
function *are* visible outside that function. This can be *very*
dangerous if you do not watch what you are doing. For example:
function changeit (array, ind, nvalue) {
    array[ind] = nvalue
}
BEGIN {
    a[1] = 1 ; a[2] = 2 ; a[3] = 3
    changeit(a, 2, "two")
    printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
}
prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit'
stores `"two"' in the second element of `a'.
File: gawk.info, Node: Return Statement, Prev: Function Caveats, Up: User-defined
The `return' Statement
======================
The body of a user-defined function can contain a `return' statement.
This statement returns control to the rest of the `awk' program. It
can also be used to return a value for use in the rest of the `awk'
program. It looks like this:
return EXPRESSION
The EXPRESSION part is optional. If it is omitted, then the returned
value is undefined and, therefore, unpredictable.
A `return' statement with no value expression is assumed at the end
of every function definition. So if control reaches the end of the
function body, then the function returns an unpredictable value. `awk'
will not warn you if you use the return value of such a function; you
will simply get unpredictable or unexpected results.
Here is an example of a user-defined function that returns a value
for the largest number among the elements of an array:
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
You call `maxelt' with one argument, which is an array name. The local
variables `i' and `ret' are not intended to be arguments; while there
is nothing to stop you from passing two or three arguments to `maxelt',
the results would be strange. The extra space before `i' in the
function parameter list is to indicate that `i' and `ret' are not
supposed to be arguments. This is a convention which you should follow
when you define functions.
Here is a program that uses our `maxelt' function. It loads an
array, calls `maxelt', and then reports the maximum number in that
array:
awk '
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
# Load all fields of each record into nums.
{
    for (i = 1; i <= NF; i++)
        nums[NR, i] = $i
}
END {
    print maxelt(nums)
}'
Given the following input:
1 5 23 8 16
44 3 5 2 8 26
256 291 1396 2962 100
-6 467 998 1101
99385 11 0 225
our program tells us (predictably) that:
99385
is the largest number in our array.
File: gawk.info, Node: Built-in Variables, Next: Command Line, Prev: User-defined, Up: Top
Built-in Variables
******************
Most `awk' variables are available for you to use for your own
purposes; they never change except when your program assigns values to
them, and never affect anything except when your program examines them.
A few variables have special built-in meanings. Some of them `awk'
examines automatically, so that they enable you to tell `awk' how to do
certain things. Others are set automatically by `awk', so that they
carry information from the internal workings of `awk' to your program.
This chapter documents all the built-in variables of `gawk'. Most
of them are also documented in the chapters where their areas of
activity are described.
* Menu:
* User-modified:: Built-in variables that you change
to control `awk'.
* Auto-set:: Built-in variables where `awk'
gives you information.
File: gawk.info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables
Built-in Variables that Control `awk'
=====================================
This is a list of the variables which you can change to control how
`awk' does certain things.
`CONVFMT'
This string is used by `awk' to control conversion of numbers to
strings (*note Conversion of Strings and Numbers: Conversion.). It
works by being passed, in effect, as the first argument to the
`sprintf' function. Its default value is `"%.6g"'. `CONVFMT' was
introduced by the POSIX standard.
`FIELDWIDTHS'
This is a space separated list of columns that tells `gawk' how to
manage input with fixed, columnar boundaries. It is an
experimental feature that is still evolving. Assigning to
`FIELDWIDTHS' overrides the use of `FS' for field splitting. *Note
Reading Fixed-width Data: Constant Size, for more information.
If `gawk' is in compatibility mode (*note Invoking `awk': Command
Line.), then `FIELDWIDTHS' has no special meaning, and field
splitting operations are done based exclusively on the value of
`FS'.
`FS'
`FS' is the input field separator (*note Specifying how Fields are
Separated: Field Separators.). The value is a single-character
string or a multi-character regular expression that matches the
separations between fields in an input record.
The default value is `" "', a string consisting of a single space.
As a special exception, this value actually means that any
sequence of spaces and tabs is a single separator. It also causes
spaces and tabs at the beginning or end of a line to be ignored.
You can set the value of `FS' on the command line using the `-F'
option:
awk -F, 'PROGRAM' INPUT-FILES
If `gawk' is using `FIELDWIDTHS' for field-splitting, assigning a
value to `FS' will cause `gawk' to return to the normal,
regexp-based, field splitting.
`IGNORECASE'
If `IGNORECASE' is nonzero, then *all* regular expression matching
is done in a case-independent fashion. In particular, regexp
matching with `~' and `!~', and the `gsub', `index', `match',
`split' and `sub' functions all ignore case when doing their
particular regexp operations. *Note:* since field splitting with
the value of the `FS' variable is also a regular expression
operation, that too is done with case ignored. *Note
Case-sensitivity in Matching: Case-sensitivity.
If `gawk' is in compatibility mode (*note Invoking `awk': Command
Line.), then `IGNORECASE' has no special meaning, and regexp
operations are always case-sensitive.
`OFMT'
This string is used by `awk' to control conversion of numbers to
strings (*note Conversion of Strings and Numbers: Conversion.) for
printing with the `print' statement. It works by being passed, in
effect, as the first argument to the `sprintf' function. Its
default value is `"%.6g"'. Earlier versions of `awk' also used
`OFMT' to specify the format for converting numbers to strings in
general expressions; this has been taken over by `CONVFMT'.
`OFS'
This is the output field separator (*note Output Separators::.).
It is output between the fields output by a `print' statement. Its
default value is `" "', a string consisting of a single space.
`ORS'
This is the output record separator. It is output at the end of
every `print' statement. Its default value is a string containing
a single newline character, which could be written as `"\n"'.
(*Note Output Separators::.)
`RS'
This is `awk''s input record separator. Its default value is a
string containing a single newline character, which means that an
input record consists of a single line of text. (*Note How Input
is Split into Records: Records.)
`SUBSEP'
`SUBSEP' is the subscript separator. It has the default value of
`"\034"', and is used to separate the parts of the name of a
multi-dimensional array. Thus, if you access `foo[12,3]', it
really accesses `foo["12\0343"]' (*note Multi-dimensional Arrays:
Multi-dimensional.).
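The following is a small sketch of how several of these variables
interact, assuming a POSIX-conforming `awk' (one that has `CONVFMT').
The sample data and the names `x', `s', `k', `n', `part' and `foo' are
chosen only for illustration; the shell variables `sep', `fmt' and
`sub' merely hold the output of each run.

```shell
# FS is set with -F; OFS and ORS shape what print writes.
sep=$(printf 'a,b,c\nd,e,f\n' | awk -F, '
BEGIN { OFS = "-"; ORS = ";\n" }
{ print $1, $3 }')

# OFMT applies to numbers given to print; CONVFMT to other conversions.
fmt=$(awk 'BEGIN {
    OFMT = "%.2f"; CONVFMT = "%.3f"
    x = 3.14159
    print x                  # converted with OFMT
    s = x ""                 # concatenation converts with CONVFMT
    print s
}')

# A subscript written as 12,3 is stored as 12 SUBSEP 3.
sub=$(awk 'BEGIN {
    foo[12,3] = "x"
    for (k in foo) {
        n = split(k, part, SUBSEP)
        print n, part[1], part[2]
    }
}')
printf '%s\n' "$sep" "$fmt" "$sub"
```

The first run joins the selected fields with `-' and ends each record
with `;' and a newline. The second shows the same number converted in
two different ways. The third recovers both parts of the subscript by
splitting the array index on `SUBSEP'.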
File: gawk.info, Node: Auto-set, Prev: User-modified, Up: Built-in Variables
Built-in Variables that Convey Information
==========================================
This is a list of the variables that are set automatically by `awk'
on certain occasions so as to provide information to your program.
`ARGC'
`ARGV'
The command-line arguments available to `awk' programs are stored
in an array called `ARGV'. `ARGC' is the number of command-line
arguments present. *Note Invoking `awk': Command Line. `ARGV' is
indexed from zero to `ARGC - 1'. For example:
awk 'BEGIN { for (i = 0; i < ARGC; i++)
                 print ARGV[i] }' inventory-shipped BBS-list
In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
`"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The
value of `ARGC' is 3, one more than the index of the last element
in `ARGV' since the elements are numbered from zero.
The names `ARGC' and `ARGV', as well as the convention of indexing
the array from 0 to `ARGC - 1', are derived from the C language's
method of accessing command line arguments.
Notice that the `awk' program is not entered in `ARGV'. The other
special command line options, with their arguments, are also not
entered. But variable assignments on the command line *are*
treated as arguments, and do show up in the `ARGV' array.
Your program can alter `ARGC' and the elements of `ARGV'. Each
time `awk' reaches the end of an input file, it uses the next
element of `ARGV' as the name of the next input file. By storing a
different string there, your program can change which files are
read. You can use `"-"' to represent the standard input. By
storing additional elements and incrementing `ARGC' you can cause
additional files to be read.
If you decrease the value of `ARGC', that eliminates input files
from the end of the list. By recording the old value of `ARGC'
elsewhere, your program can treat the eliminated arguments as
something other than file names.
To eliminate a file from the middle of the list, store the null
string (`""') into `ARGV' in place of the file's name. As a
special feature, `awk' ignores file names that have been replaced
with the null string.
`ENVIRON'
This is an array that contains the values of the environment. The
array indices are the environment variable names; the values are
the values of the particular environment variables. For example,
`ENVIRON["HOME"]' might be `/u/close'. Changing this array does
not affect the environment passed on to any programs that `awk'
may spawn via redirection or the `system' function. (In a future
version of `gawk', it may do so.)
Some operating systems may not have environment variables. On such
systems, the array `ENVIRON' is empty.
`FILENAME'
This is the name of the file that `awk' is currently reading. If
`awk' is reading from the standard input (in other words, there
are no files listed on the command line), `FILENAME' is set to
`"-"'. `FILENAME' is changed each time a new file is read (*note
Reading Input Files: Reading Files.).
`FNR'
`FNR' is the current record number in the current file. `FNR' is
incremented each time a new record is read (*note Explicit Input
with `getline': Getline.). It is reinitialized to 0 each time a
new input file is started.
`NF'
`NF' is the number of fields in the current input record. `NF' is
set each time a new record is read, when a new field is created,
or when `$0' changes (*note Examining Fields: Fields.).
`NR'
This is the number of input records `awk' has processed since the
beginning of the program's execution (*note How Input is Split
into Records: Records.). `NR' is set each time a new record is
read.
`RLENGTH'
`RLENGTH' is the length of the substring matched by the `match'
function (*note Built-in Functions for String Manipulation: String
Functions.). `RLENGTH' is set by invoking the `match' function.
Its value is the length of the matched string, or -1 if no match
was found.
`RSTART'
`RSTART' is the start-index in characters of the substring matched
by the `match' function (*note Built-in Functions for String
Manipulation: String Functions.). `RSTART' is set by invoking the
`match' function. Its value is the position of the string where
the matched substring starts, or 0 if no match was found.
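To make the variables in this list more concrete, here is a sketch that
exercises several of them. The temporary files `one' and `two', their
contents, and the shell variable names are assumptions made for this
example only. The command-line assignment `n=1' is included to show
that such arguments appear in `ARGV'.

```shell
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one"
printf 'f\n' > "$dir/two"

# ARGC counts the program name plus every non-option argument.
args=$(awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print ARGV[i] }' \
    inventory-shipped BBS-list n=1)

# FNR restarts for each file; NR keeps counting; NF is per record.
counts=$(awk '{ print FNR, NR, NF }' "$dir/one" "$dir/two")

# Storing the null string in ARGV[1] drops the first file from the list.
kept=$(awk 'BEGIN { ARGV[1] = "" } { print }' "$dir/one" "$dir/two")

# match sets RSTART and RLENGTH, using 0 and -1 when nothing matches.
pos=$(awk 'BEGIN {
    match("foobar", /o+/); print RSTART, RLENGTH
    match("foobar", /z/);  print RSTART, RLENGTH
}')

rm -rf "$dir"
printf '%s\n' "$args" "$counts" "$kept" "$pos"
```

The first command never opens its file arguments, because a program
consisting only of a `BEGIN' rule reads no input; the file names are
there purely to populate `ARGV'.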
File: gawk.info, Node: Command Line, Next: Language History, Prev: Built-in Variables, Up: Top
Invoking `awk'
**************
There are two ways to run `awk': with an explicit program, or with
one or more program files. Here are templates for both of them; items
enclosed in `[...]' in these templates are optional.
awk [`-FFS'] [`-W' GAWK-OPTS] [`-v VAR=VAL'] [`--'] 'PROGRAM' FILE ...
awk [`-FFS'] [`-W' GAWK-OPTS] [`-v VAR=VAL'] `-f SOURCE-FILE'
[`-f SOURCE-FILE ...'] [`--'] FILE ...
* Menu:
* Options:: Command line options and their meanings.
* Other Arguments:: Input file names and variable assignments.
* AWKPATH Variable:: Searching directories for `awk' programs.
* Obsolete:: Obsolete Options and/or features.
* Undocumented:: Undocumented Options and Features.
File: gawk.info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Command Line
Command Line Options
====================
Options begin with a minus sign, and consist of a single character.
The options and their meanings are as follows:
`-FFS'
Sets the `FS' variable to FS (*note Specifying how Fields are
Separated: Field Separators.).
`-f SOURCE-FILE'
Indicates that the `awk' program is to be found in SOURCE-FILE
instead of in the first non-option argument.
`-v VAR=VAL'
Sets the variable VAR to the value VAL *before* execution of the
program begins. Such variable values are available inside the
`BEGIN' rule (see below for a fuller explanation).
The `-v' option can only set one variable, but you can use it more
than once, setting another variable each time, like this:
`-v foo=1 -v bar=2'.
`-W GAWK-OPT'
Following the POSIX standard, options that are specific to `gawk'
are supplied as arguments to the `-W' option. These arguments may
be separated by commas, or quoted and separated by whitespace.
Case is ignored when processing these options. The following
options are available:
`compat'
Specifies "compatibility mode", in which the GNU extensions in
`gawk' are disabled, so that `gawk' behaves just like Unix
`awk'. *Note Extensions in `gawk' not in POSIX `awk':
POSIX/GNU, which summarizes the extensions. Also see *Note
Downward Compatibility and Debugging: Compatibility Mode.
`lint'
Provide warnings about constructs that are dubious or
non-portable to other `awk' implementations.
`copyleft'
`copyright'
Print the short version of the General Public License. This
option may disappear in a future version of `gawk'.
`posix'
Operate in strict POSIX mode. This disables all `gawk'
extensions (just like `compat'), and adds the following
additional restrictions:
* `\x' escape sequences are not recognized (*note Constant
Expressions: Constants.).
* The synonym `func' for the keyword `function' is not
recognized (*note Syntax of Function Definitions:
Definition Syntax.).
* The operators `**' and `**=' cannot be used in place of
`^' and `^=' (*note Arithmetic Operators: Arithmetic
Ops., and also *note Assignment Expressions: Assignment
Ops.).
* Specifying `-Ft' on the command line does not set the
value of `FS' to be a single tab character (*note
Specifying how Fields are Separated: Field Separators.).
Although you can supply both `-W compat' and `-W posix' on the
command line, `-W posix' will take precedence.
`version'
Prints version information for this particular copy of `gawk'.
This is so you can determine if your copy of `gawk' is up to
date with respect to whatever the Free Software Foundation is
currently distributing. This option may disappear in a
future version of `gawk'.
`--'
Signals the end of the command line options. The following
arguments are not treated as options even if they begin with `-'.
This interpretation of `--' follows the POSIX argument parsing
conventions.
This is useful if you have file names that start with `-', or in
shell scripts, if you have file names that will be specified by
the user which could start with `-'.
The `-a', `-e', `-c', `-C', and `-V' options of `gawk' version
2.11.1 are recognized, but produce a warning message. They will go
away in the next major release of `gawk'.
Any other options are flagged as invalid with a warning message, but
are otherwise ignored.
In compatibility mode, as a special case, if the value of FS supplied
to the `-F' option is `t', then `FS' is set to the tab character
(`"\t"'). This is only true for `-W compat', and not for `-W posix'
(*note Specifying how Fields are Separated: Field Separators.).
If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.
The `-f' option may be used more than once on the command line. Then
`awk' reads its program source from all of the named files, as if they
had been concatenated together into one big file. This is useful for
creating libraries of `awk' functions. Useful functions can be written
once, and then retrieved from a standard place, instead of having to be
included in each individual program. You can still type in a program
at the terminal and use library functions, by specifying `-f /dev/tty'.
`awk' will read a file from the terminal to use as part of the `awk'
program. After typing your program, type `Control-d' (the end-of-file
character) to terminate it. (You may also use `-f -' to read program
source from the standard input, but then you will not be able to also
use the standard input as a source of data.)
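The library idea can be sketched as follows. The file names
`mathlib.awk' and `largest.awk', and the function `max', are
hypothetical; they are not part of any `awk' distribution.

```shell
workdir=$(mktemp -d)

# Hypothetical library file: one reusable function.
cat > "$workdir/mathlib.awk" <<'EOF'
# max(a, b): return the larger of two numbers.
function max(a, b) {
    return (a > b) ? a : b
}
EOF

# Hypothetical main program: it calls max() without defining it.
cat > "$workdir/largest.awk" <<'EOF'
{ largest = (NR == 1) ? $1 + 0 : max(largest, $1 + 0) }
END { print largest }
EOF

# Both files are read as if they had been concatenated into one program.
largest=$(printf '3\n17\n5\n' |
    awk -f "$workdir/mathlib.awk" -f "$workdir/largest.awk")
echo "$largest"
```

The order of the `-f' options does not matter here, since a function
may be defined after the rule that calls it.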
File: gawk.info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Command Line
Other Command Line Arguments
============================
Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an
argument of the form `VAR=VALUE' assigns the value VALUE to the
variable VAR; it does not specify a file at all.
All these arguments are made available to your `awk' program in the
`ARGV' array (*note Built-in Variables::.). Command line options and
the program text (if present) are omitted from the `ARGV' array. All
other arguments, including variable assignments, are included.
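A short sketch of what ends up in `ARGV', assuming a POSIX-style `awk'.
The variable names `unit' and `total' and the two log file names are
hypothetical. No file is opened, because the program consists only of a
`BEGIN' rule.

```shell
# Print every element of ARGV except ARGV[0] (the name of the command).
argv_listing=$(awk -v unit=kb 'BEGIN {
    for (i = 1; i < ARGC; i++)
        print i ": " ARGV[i]
}' total=0 first.log second.log)
echo "$argv_listing"
```

The listing contains `total=0' and the two file names, but neither the
`-v unit=kb' option nor the program text.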
The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file. At
that point in execution, it checks the "file name" to see whether it is
really a variable assignment; if so, `awk' sets the variable instead of
reading a file.
Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available inside
a `BEGIN' rule (*note `BEGIN' and `END' Special Patterns: BEGIN/END.),
since such rules are run before `awk' begins scanning the argument list.
The values given on the command line are processed for escape sequences
(*note Constant Expressions: Constants.).
In some earlier implementations of `awk', when a variable assignment
occurred before any file names, the assignment would happen *before*
the `BEGIN' rule was executed. Some applications came to depend upon
this "feature." When `awk' was changed to be more consistent, the `-v'
option was added to accommodate applications that depended upon this
old behavior.
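The difference in timing can be seen with a sketch like the following,
assuming a POSIX-style `awk'. The variable `n' and the one-line data
file are made up for the illustration.

```shell
datafile=$(mktemp)
printf 'record\n' > "$datafile"

# Assignment given as an ordinary argument: not yet made when BEGIN runs.
late=$(awk 'BEGIN { print "BEGIN sees [" n "]" }
                  { print "rule sees [" n "]" }' n=5 "$datafile")

# Assignment given with -v: made before BEGIN runs.
early=$(awk -v n=5 'BEGIN { print "BEGIN sees [" n "]" }')

echo "$late"
echo "$early"
```

In the first command the `BEGIN' rule prints empty brackets and the main
rule prints `[5]'; in the second the `BEGIN' rule already prints `[5]'.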
The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before scanning the data files. It is also useful for
controlling state if multiple passes are needed over a data file. For
example:
awk 'pass == 1 { PASS 1 STUFF }
pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
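As one possible concrete instance of this template (a sketch, not the
only way to use it), the first pass below accumulates a sum and the
second pass prints each value's offset from the mean. The data file and
the variable names `sum' and `n' are hypothetical.

```shell
datafile=$(mktemp)
printf '2\n4\n6\n' > "$datafile"

# Pass 1 accumulates the sum; pass 2 prints each value minus the mean.
offsets=$(awk 'pass == 1 { sum += $1; n++ }
               pass == 2 { print $1 - sum / n }' \
              pass=1 "$datafile" pass=2 "$datafile")
echo "$offsets"
```

Each `pass=N' assignment takes effect just before the file name that
follows it is opened, so the same file is read once under each setting.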
Given the variable assignment feature, the `-F' option is not
strictly necessary. It remains for historical compatibility.
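A brief sketch of the equivalence, assuming a POSIX-style `awk'; the
colon-separated data file is made up for the illustration.

```shell
datafile=$(mktemp)
printf 'root:x:0\ndaemon:x:1\n' > "$datafile"

# Field separator set with the -F option.
with_option=$(awk -F: '{ print $1 }' "$datafile")

# Field separator set by a variable assignment before the file name.
with_assignment=$(awk '{ print $1 }' FS=: "$datafile")

echo "$with_option"
echo "$with_assignment"
```

Both commands print the first colon-separated field of each line. The
two forms differ only inside a `BEGIN' rule, where the value set by
`FS=:' is not yet in effect.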
File: gawk.info, Node: AWKPATH Variable, Next: Obsolete, Prev: Other Arguments, Up: Command Line
The `AWKPATH' Environment Variable
==================================
The previous section described how `awk' program files can be named
on the command line with the `-f' option. In some `awk'
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
But in `gawk', if the file name supplied in the `-f' option does not
contain a `/', then `gawk' searches a list of directories (called the
"search path"), one by one, looking for a file with the specified name.
The search path is actually a string consisting of directory names
separated by colons. `gawk' gets its search path from the `AWKPATH'
environment variable. If that variable does not exist, `gawk' uses the
default path, which is `.:/usr/lib/awk:/usr/local/lib/awk'. (Programs
written by system administrators should use an `AWKPATH' variable that
does not include the current directory, `.'.)
The search path feature is particularly useful for building up
libraries of useful `awk' functions. The library files can be placed
in a standard directory that is in the default path, and then specified
on the command line with a short file name. Otherwise, the full file
name would have to be typed for each file.
Path searching is not done if `gawk' is in compatibility mode. This
is true for both `-W compat' and `-W posix'. *Note Invoking `awk':
Command Line.
*Note:* if you want files in the current directory to be found, you
must include the current directory in the path, either by writing `.'
as an entry in the path, or by writing a null entry in the path. (A
null entry is indicated by starting or ending the path with a colon, or
by placing two colons next to each other (`::').) If the current
directory is not included in the path, then files cannot be found in
the current directory. This path search mechanism is identical to the
shell's.
File: gawk.info, Node: Obsolete, Next: Undocumented, Prev: AWKPATH Variable, Up: Command Line
Obsolete Options and/or Features
================================
This section describes features and/or command line options from the
previous release of `gawk' that are either not available in the current
version, or that are still supported but deprecated (meaning that they
will *not* be in the next release).
For version 2.14 of `gawk', the following command line options are
recognized, but produce a warning message (*note Invoking `awk':
Command Line.).
`-c'
Use `-W compat' instead.
`-V'
Use `-W version' instead.
`-C'
Use `-W copyright' instead.
`-a'
`-e'
These options produce a warning message but have no effect on the
execution of `gawk'. The POSIX standard now specifies traditional
`awk' regular expressions for the `awk' utility.
The public-domain version of `strftime' that is distributed with
`gawk' changed for the 2.14 release. The `%V' conversion specifier
that used to generate the date in VMS format was changed to `%v'. This
is because the POSIX standard for the `date' utility now specifies a
`%V' conversion specifier. *Note Functions for Dealing with Time
Stamps: Time Functions, for details.
File: gawk.info, Node: Undocumented, Prev: Obsolete, Up: Command Line
Undocumented Options and Features
=================================
This section intentionally left blank.
File: gawk.info, Node: Language History, Next: Installation, Prev: Command Line, Up: Top
The Evolution of the `awk' Language
***********************************
This manual describes the GNU implementation of `awk', which is
patterned after the POSIX specification. Many `awk' users are only
familiar with the original `awk' implementation in Version 7 Unix,
which is also the basis for the version in Berkeley Unix (through
4.3--Reno). This chapter briefly describes the evolution of the `awk'
language.
* Menu:
* V7/S5R3.1:: The major changes between V7 and
System V Release 3.1.
* S5R4:: Minor changes between System V
Releases 3.1 and 4.
* POSIX:: New features from the POSIX standard.
* POSIX/GNU:: The extensions in `gawk'
not in POSIX `awk'.
File: gawk.info, Node: V7/S5R3.1, Next: S5R4, Prev: Language History, Up: Language History
Major Changes between V7 and S5R3.1
===================================
The `awk' language evolved considerably between the release of
Version 7 Unix (1978) and the new version first made widely available in
System V Release 3.1 (1987). This section summarizes the changes, with
cross-references to further details.
* The requirement for `;' to separate rules on a line (*note `awk'
Statements versus Lines: Statements/Lines.).
* User-defined functions, and the `return' statement (*note
User-defined Functions: User-defined.).
* The `delete' statement (*note The `delete' Statement: Delete.).
* The `do'-`while' statement (*note The `do'-`while' Statement: Do
Statement.).
* The built-in functions `atan2', `cos', `sin', `rand' and `srand'
(*note Numeric Built-in Functions: Numeric Functions.).
* The built-in functions `gsub', `sub', and `match' (*note Built-in
Functions for String Manipulation: String Functions.).
* The built-in functions `close', which closes an open file, and
`system', which allows the user to execute operating system
commands (*note Built-in Functions for Input/Output: I/O
Functions.).
* The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
built-in variables (*note Built-in Variables::.).
* The conditional expression using the operators `?' and `:' (*note
Conditional Expressions: Conditional Exp.).
* The exponentiation operator `^' (*note Arithmetic Operators:
Arithmetic Ops.) and its assignment operator form `^=' (*note
Assignment Expressions: Assignment Ops.).
* C-compatible operator precedence, which breaks some old `awk'
programs (*note Operator Precedence (How Operators Nest):
Precedence.).
* Regexps as the value of `FS' (*note Specifying how Fields are
Separated: Field Separators.), and as the third argument to the
`split' function (*note Built-in Functions for String
Manipulation: String Functions.).
* Dynamic regexps as operands of the `~' and `!~' operators (*note
How to Use Regular Expressions: Regexp Usage.).
* Escape sequences (*note Constant Expressions: Constants.) in
regexps.
* The escape sequences `\b', `\f', and `\r' (*note Constant
Expressions: Constants.).
* Redirection of input for the `getline' function (*note Explicit
Input with `getline': Getline.).
* Multiple `BEGIN' and `END' rules (*note `BEGIN' and `END' Special
Patterns: BEGIN/END.).
* Simulated multi-dimensional arrays (*note Multi-dimensional
Arrays: Multi-dimensional.).
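Several of the features in this list can be seen together in a short
sketch, assuming an `awk' that implements the post-V7 language. The
function `square' and the array `seen' are hypothetical names chosen for
the illustration.

```shell
report=$(printf 'alpha beta\n' | awk '
# User-defined function with a return statement and the ^ operator.
function square(x) { return x ^ 2 }
{
    n = gsub(/a/, "A")              # gsub returns the substitution count
    print n, $0
    pos = match($0, /be/)           # match sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    size = (square(3) > 8) ? "big" : "small"    # conditional expression
    print size
    seen["x", "y"] = 1              # simulated two-dimensional array
    for (key in seen) {
        split(key, parts, SUBSEP)   # subscripts are joined with SUBSEP
        print parts[1], parts[2]
    }
    delete seen["x", "y"]           # the delete statement
    state = (("x", "y") in seen) ? "present" : "deleted"
    print state
}')
echo "$report"
```

None of these constructs is accepted by the original Version 7 `awk'.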
File: gawk.info, Node: S5R4, Next: POSIX, Prev: V7/S5R3.1, Up: Language History
Changes between S5R3.1 and S5R4
===============================
The System V Release 4 version of Unix `awk' added these features
(some of which originated in `gawk'):
* The `ENVIRON' variable (*note Built-in Variables::.).
* Multiple `-f' options on the command line (*note Invoking `awk':
Command Line.).
* The `-v' option for assigning variables before program execution
begins (*note Invoking `awk': Command Line.).
* The `--' option for terminating command line options.
* The `\a', `\v', and `\x' escape sequences (*note Constant
Expressions: Constants.).
* A defined return value for the `srand' built-in function (*note
Numeric Built-in Functions: Numeric Functions.).
* The `toupper' and `tolower' built-in string functions for case
translation (*note Built-in Functions for String Manipulation:
String Functions.).
* A cleaner specification for the `%c' format-control letter in the
`printf' function (*note Using `printf' Statements for Fancier
Printing: Printf.).
* The ability to dynamically pass the field width and precision
(`"%*.*d"') in the argument list of the `printf' function (*note
Using `printf' Statements for Fancier Printing: Printf.).
* The use of constant regexps such as `/foo/' as expressions, where
they are equivalent to use of the matching operator, as in `$0 ~
/foo/' (*note Constant Expressions: Constants.).
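A short sketch of three of these additions, assuming a POSIX-style
`awk'. The environment variable `GREETING' and the variable names
`matched' and `explicit' are hypothetical.

```shell
summary=$(printf 'foo bar\nbaz\n' | GREETING=hello awk '
# ENVIRON gives access to the environment; toupper and tolower
# translate case.
BEGIN { print toupper(ENVIRON["GREETING"]), tolower("WORLD") }
# A constant regexp used as an expression is matched against $0.
{
    matched = /foo/
    explicit = ($0 ~ /foo/)
    print NR, matched, explicit
}')
echo "$summary"
```

For every record the values of `matched' and `explicit' agree, since the
bare regexp is shorthand for the explicit match against `$0'.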
File: gawk.info, Node: POSIX, Next: POSIX/GNU, Prev: S5R4, Up: Language History
Changes between S5R4 and POSIX `awk'
====================================
The POSIX Command Language and Utilities standard for `awk'
introduced the following changes into the language:
* The use of `-W' for implementation-specific options.
* The use of `CONVFMT' for controlling the conversion of numbers to
strings (*note Conversion of Strings and Numbers: Conversion.).
* The concept of a numeric string, and tighter comparison rules to go
with it (*note Comparison Expressions: Comparison Ops.).
* More complete documentation of many of the previously undocumented
features of the language.
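Two of these changes can be illustrated with a brief sketch, assuming an
`awk' that follows the POSIX rules. The variable names are made up for
the example.

```shell
# CONVFMT controls the conversion of a number to a string.
converted=$(awk 'BEGIN {
    value = 3.14159
    CONVFMT = "%.2g"
    as_string = value ""     # concatenation forces the conversion
    print as_string
}')

# Fields that look like numbers are compared as numbers;
# string constants are compared as strings.
compared=$(printf '10 9\n' | awk '{
    as_numbers = ($1 > $2)
    as_strings = ("10" > "9")
    print as_numbers, as_strings
}')

echo "$converted"
echo "$compared"
```

The first command prints `3.1'. The second prints `1 0': the fields are
numeric strings, so 10 is greater than 9, while the string constant
`"10"' sorts before `"9"'.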
File: gawk.info, Node: POSIX/GNU, Prev: POSIX, Up: Language History
Extensions in `gawk' not in POSIX `awk'
=======================================
The GNU implementation, `gawk', adds these features:
* The `AWKPATH' environment variable for specifying a path search for
the `-f' command line option (*note Invoking `awk': Command Line.).
* The various `gawk' specific features available via the `-W'
command line option (*note Invoking `awk': Command Line.).
* The `IGNORECASE' variable and its effects (*note Case-sensitivity
in Matching: Case-sensitivity.).
* The `FIELDWIDTHS' variable and its effects (*note Reading
Fixed-width Data: Constant Size.).
* The `next file' statement for skipping to the next data file
(*note The `next file' Statement: Next File Statement.).
* The `systime' and `strftime' built-in functions for obtaining and
printing time stamps (*note Functions for Dealing with Time
Stamps: Time Functions.).
* The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
file name interpretation (*note Standard I/O Streams: Special
Files.).
* The `-W compat' option to turn off these extensions (*note
Invoking `awk': Command Line.).
* The `-W posix' option for full POSIX compliance (*note Invoking
`awk': Command Line.).
File: gawk.info, Node: Installation, Next: Gawk Summary, Prev: Language History, Up: Top
Installing `gawk'
*****************
This chapter provides instructions for installing `gawk' on the
various platforms that are supported by the developers. The primary
developers support Unix (and one day, GNU), while the other ports were
contributed. The file `ACKNOWLEDGMENT' in the `gawk' distribution
lists the electronic mail addresses of the people who did the
respective ports.
* Menu:
* Gawk Distribution:: What is in the `gawk' distribution.
* Unix Installation:: Installing `gawk' under various versions
of Unix.
* VMS Installation:: Installing `gawk' on VMS.
* MS-DOS Installation:: Installing `gawk' on MS-DOS.
* Atari Installation:: Installing `gawk' on the Atari ST.
File: gawk.info, Node: Gawk Distribution, Next: Unix Installation, Prev: Installation, Up: Installation
The `gawk' Distribution
=======================
This section first describes how to get and extract the `gawk'
distribution, and then discusses what is in the various files and
subdirectories.
* Menu:
* Extracting:: How to get and extract the distribution.
* Distribution contents:: What is in the distribution.